Nothing Found

Your search for +abstract:load +abstract:executable +abstract:modify
+abstract:substitute did not return any results.

1 Experiment manage ...

Karen L. Karavanic, Barton P. Miller 

November 1997 Proceedings of the 1997 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf(69.59 KB) Additional Information: full citation, abstract, references, citings

The development of a high-performance parallel system or application is an evolutionary 
process. It may begin with models or simulations, followed by an initial implementation of 
the program. The code is then incrementally modified to tune its performance and 
continues to evolve throughout the application's life span. At each step, the key question
for developers is: how and how much did the performance change? This question arises 
comparing an implementation to models or simu ... 



2 Ease: an environm ...
Jack W. Davidson, David B. Whalley 

April 1990 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the 1990
ACM SIGMETRICS conference on Measurement and modeling of computer
systems, Volume 18 Issue 1

Full text available: pdf(220.72 KB) Additional Information: full citation, abstract, references, citings, index terms

Gathering detailed measurements of the execution behavior of an instruction set 
architecture is difficult. There are two major problems that must be solved. First, for 
meaningful measurements to be obtained, programs that represent typical work load and 
instruction mixes must be used. This means that high-level language compilers for the 
target architecture are required. This problem is further compounded as most architectures 
require an optimizing compiler to exploit their capabilities. Bu ... 

3 Configurable applic ...
Griffith Hamlin, James D. Foley 

April 1975 ACM SIGGRAPH Computer Graphics, Proceedings of the 2nd annual
conference on Computer graphics and interactive techniques, Volume 9 Issue 1

Full text available: pdf(160.73 KB) Additional Information: full citation, abstract, references, citings

This paper reports on CAGES, a programming system which substantially simplifies the 
process of writing interactive graphics application programs for use in a distributed 
processing, satellite-host configuration. It allows programs written in a PL/I subset to be 
configurable: program modules (main program, subroutines) and data can be easily
reassigned from the host to the satellite, or vice versa. That is, the division of labor
between the two computers is readily modified. The CAGES system su ...

http://portal.acm.org/results.cfm?CFID=55074663&CFTOKEN=31695420&adv=l&C... 9/19/05

Results (page 1): +abstract:load +abstract:executable +abstract:modify

4 ... address and cache coordinate

Byung-Kwon Chung, Jinsuo Zhang, Jih-Kwon Peir, Shih-Chang Lai, Konrad Lai 
December 2001 Proceedings of the 34th annual ACM/IEEE international symposium on 
Microarchitecture 

Full text available: pdf(1.38 MB) Additional Information: full citation, abstract, references, citings

Publisher Site

An increasing cache latency in future processors incurs profound performance impacts in 
spite of advanced out-of-order execution techniques. In this paper, we describe an early 
address resolution mechanism that accurately resolves both regular and irregular load 
addresses. The basic idea is to build dynamic dependence links from the instruction that 
updates the base register to the consumer load instructions. Once a new base address is 
available, it triggers calculations of the new load addresse ... 



5 Exploiting Java instruction/thread level parallelism with horizontal multithreading
Kenji Watanabe, Wanming Chu, Yamin Li 

January 2001 Australian Computer Science Communications, Proceedings of the 6th
Australasian conference on Computer systems architecture ACSAC '01,
Volume 23 Issue 4

Full text available: pdf(767.34 KB) Additional Information: full citation, abstract, references

Java bytecodes can be executed with the following three methods: a Java interpreter 
running on a particular machine interprets bytecodes; a Just-In-Time (JIT) compiler 
translates bytecodes to the native primitives of the particular machine and the machine 
executes the translated codes; and a Java processor executes bytecodes directly. The first 
two methods require no special hardware support for the execution of Java bytecodes and 
are widely used currently. The last method requires an embedded J ... 

David Rochberg, Garth Gibson 

December 1997 ACM SIGMETRICS Performance Evaluation Review, Volume 25 Issue 3

Full text available: pdf(1.09 MB) Additional Information: full citation, abstract, citings, index terms

We discuss CTIP, an implementation of a network filesystem extension of the successful TIP 
informed prefetching and cache management system. Using a modified version of TIP in 
NFS client machines (and unmodified NFS servers), CTIP takes advantage of application-
supplied hints that disclose the application's future read accesses. CTIP uses these hints to 
aggressively prefetch file data from an NFS file server and to make better local cache 
replacement decisions. This prefetching hides disk latenc ... 

7 Low power mem ... improving performance
Jia-Jhe Li, Yuan-Shin Hwang 

August 2005 Proceedings of the 2005 international symposium on Low power 
electronics and design ISLPED '05 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

As transistors keep shrinking and on-chip data caches keep growing, static power 
dissipation due to leakage of caches takes an increasing fraction of total power in 
processors. Several techniques have already been proposed to reduce leakage power by 
turning off unused cache lines. However, they all have to pay the price of performance 
degradation. This paper presents a cache architecture, the Snug Set-Associative (SSA) 
cache, that does not only cut most of static power dissipation ... 
Keywords: leakage power, set-associative caches



8 Informed multi-process prefetching and caching
Andrew Tomkins, R. Hugo Patterson, Garth Gibson 

June 1997 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
1997 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems, Volume 25 Issue 1

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Informed prefetching and caching based on application disclosure of future I/O accesses 
(hints) can dramatically reduce the execution time of I/O-intensive applications. A recent 
study showed that, in the context of a single hinting application, prefetching and caching 
algorithms should adapt to the dynamic load on the disks to obtain the best performance. 
In this paper, we show how to incorporate adaptivity to disk load into the TIP2 system, 
which uses cost-benefit analysis to allocate g ... 

9 Maintaining Consistency and Bounding Capacity of Software Code Caches 
Derek Bruening, Saman Amarasinghe 

March 2005 Proceedings of the international symposium on Code generation and 
optimization CGO '05 

Full text available: pdf(253.55 KB) Additional Information: full citation, abstract

Software code caches are becoming ubiquitous, in dynamic optimizers, runtime tool 
platforms, dynamic translators, fast simulators and emulators, and dynamic compilers. 
Caching frequently executed fragments of code provides significant performance boosts, 
reducing the overhead of translation and emulation and meeting or exceeding native 
performance in dynamic optimizers. One disadvantage of caching, memory expansion, can 
sometimes be ignored when executing a single application. However, as optimiz ...

10 Code 

Gadi Haber, Moshe Klausner, Vadim Eisenberg, Bilha Mendelson, Maxim Gurevich 
March 2003 Proceedings of the international symposium on Code generation and 
optimization: feedback-directed and runtime optimization CGO '03 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Memory access has proven to be one of the bottlenecks in modern architectures. Improving 
memory locality and eliminating the amount of memory access can help release this 
bottleneck. We present a method for link-time profile-based optimization by reordering the 
global data of the program and modifying its code accordingly. The proposed optimization 
reorders the entire global data of the program, according to a representative execution rate 
of each instruction (or basic block) in the code. The da ... 

11 Eliminating receive livelock in an interrupt-driven kernel 
Jeffrey C. Mogul, K. K. Ramakrishnan 

August 1997 ACM Transactions on Computer Systems (TOCS), Volume 15 Issue 3 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review

Most operating systems use interface interrupts to schedule network tasks. Interrupt-driven 
systems can provide low overhead and good latency at low offered load, but degrade 
significantly at higher arrival rates unless care is taken to prevent several pathologies. 
These are various forms of receive livelock, in which the system spends all of its time
processing interrupts, to the exclusion of other necessary tasks. Under extreme conditions,
no packets are delivered to the user ...

Keywords: interrupt-driven kernel, livelock, polling, scheduling 



12 Pthreads for dynamic ...

Girija J. Narlikar, Guy E. Blelloch 

November 1998 Proceedings of the 1998 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: html(82.60 KB) Additional Information: full citation, abstract, references, citings

High performance applications on shared memory machines have typically been written in a 
coarse grained style, with one heavyweight thread per processor. In comparison, 
programming with a large number of lightweight, parallel threads has several advantages, 
including simpler coding for programs with irregular and dynamic parallelism, and better 
adaptability to a changing number of processors. The programmer can express a new 
thread to execute each individual parallel task; the implementation dyn ... 

Keywords: Pthreads, dynamic scheduling, irregular parallelism, lightweight threads, 
multithreading, space efficiency 




13 Automatic resumption mechanism for program debugging 
Takao Shimomura 

October 1991 ACM SIGSOFT Software Engineering Notes, volume 16 issue 4 
Full text available: pdf Additional Information: full citation, abstract, index terms

In program debugging, tracing control instructions that examine internal program states 
can be saved in a file and this file can be used to initialize the debugging environment when 
a program is loaded by a debugger. When a source program is modified because of bugs, 
however, source line numbers are also changed. It is therefore necessary to update the 
source line numbers in a tracing control instruction file according to the modifications in the 
source program. This paper proposes a solution by ... 




14 Accelerators: ...

Atsushi Kosaka, Satoshi Yamaguchi, Hiroyuki Okuhata, Takao Onoye, Isao Shirakawa
April 2004 Proceedings of the 1st conference on Computing frontiers 

Full text available: pdf(644.59 KB) Additional Information: full citation, abstract, references, index terms

This paper presents an ARM-based SoC architecture for the Ogg Vorbis audio decoder. A 
trivial software-based implementation incurs high computational cost and requires high 
operation frequency. In order to achieve realtime processing and efficient bus interface 
design for our target system, the load of an embedded processor is reduced through the 
use of specific hardware for a functional block that has higher computational complexity 
than other blocks of Ogg Vorbis decoding process. Based on com ... 

15 Energy efficient architectures: Reducing power requirements of instruction scheduling
through dynamic ...

Dmitry Ponomarev, Gurhan Kucuk, Kanad Ghose 

December 2001 Proceedings of the 34th annual ACM/IEEE international symposium on
Microarchitecture

Full text available: pdf Additional Information: full citation, abstract, references, citings

Publisher Site

The "one-size-fits-all" philosophy used for permanently allocating datapath resources in 
today's superscalar CPUs to maximize performance across a wide range of applications 
results in the overcommitment of resources in general. To reduce power dissipation in the
datapath, the resource allocations can be dynamically adjusted based on the demands of 
applications. We propose a mechanism to dynamically, simultaneously and independently 
adjust the sizes of the issue queue (IQ), the reorder buffer (R ... 

Keywords: dynamic instruction scheduling, energy-efficient datapath, power reduction, 
superscalar processor 



16 Cost-based query scrambling for initial delays 
Tolga Urhan, Michael J. Franklin, Laurent Amsaleg 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data, Volume 27 Issue 2

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Remote data access from disparate sources across a wide-area network such as the 
Internet is problematic due to the unpredictable nature of the communications medium and 
the lack of knowledge about the load and potential delays at remote sites. Traditional, 
static, query processing approaches break down in this environment because they are 
unable to adapt in response to unexpected delays. Query scrambling has been proposed to 
address this problem. Scrambling modifies query execution plans o ... 

17 A brief su ...
Mark Nuttall

October 1994 ACM SIGOPS Operating Systems Review, volume 28 issue 4 

Full text available: pdf(1.19 MB) Additional Information: full citation, abstract, citings, index terms

Migration is the movement of an active entity from one machine to another during 
execution. Such migration may be used for dynamic load balancing purposes with the aim 
of gaining increased performance from a group of processors than may be gained by 
schemes simply allocating processes to processors at run time. Schemes providing object 
migration also offer object persistence, improved fault tolerance and potentially more 
efficient remote object invocation (RPC). The survey covers systems providin ...

18 Generating instruction sets and microarchitectures from applications
Ing-Jer Huang, Alvin M. Despain 

November 1994 Proceedings of the 1994 IEEE/ACM international conference on 
Computer-aided design 

Full text available: pdf(751.49 KB) Additional Information: full citation, abstract, references, citings, index terms

The design of application-specific instruction set processor (ASIP) system includes at least
three interdependent tasks: microarchitecture design, instruction set design, and instruction 
set mapping for the application. We present a method that unifies these three design 
problems with a single formulation: a modified scheduling/allocation problem with an 
integrated instruction formation process. Micro-operations (MOPs) representing the 
application are scheduled into time steps. Instructions ... 

19 Efficient softw ...
Robert Wahbe, Steven Lucco, Thomas E. Anderson, Susan L. Graham 

December 1993 ACM SIGOPS Operating Systems Review, Proceedings of the
fourteenth ACM symposium on Operating systems principles, Volume 27
Issue 5

Full text available: pdf(1.49 MB) Additional Information: full citation, abstract, references, citings, index terms

One way to provide fault isolation among cooperating software modules is to place each in
its own address space. However, for tightly-coupled modules, this solution incurs prohibitive 
context switch overhead. In this paper, we present a software approach to implementing 
fault isolation within a single address space. Our approach has two parts. First, we load the 
code and data for a distrusted module into its own fault domain, a logically separate
portion of the application's address space ... 

20 On testing cache-coherent shared memories
Phillip B. Gibbons, Ephraim Korach 

August 1994 Proceedings of the sixth annual ACM symposium on Parallel algorithms 
and architectures 

Full text available: pdf(1.30 MB) Additional Information: full citation, abstract, references, citings, index terms

Sequential consistency is the most-widely used correctness condition for multiprocessor 
memory systems. High-performance shared memory multiprocessors such as the Kendall 
Square KSR1, the Stanford DASH, and the MIT Alewife employ a variety of techniques to 
improve memory system performance while providing sequential consistency. Primary 
among them is the use of caches at each processor, kept coherent by protocols 
implemented in hardware. We study the problem of testing shared mem ... 
1. AUTOLOOP: Automated Action Selection in the "Observe-Analyze-Act" L
Systems 

Li Yin; Uttamchandani, S.; Palmer, J.; Katz, R.; Gul Agha; 

Policies for Distributed Systems and Networks, 2005. Sixth IEEE International ' 

06-08 June 2005 Page(s):129 - 138 

Digital Object Identifier 10.1109/POLICY.2005.9 

AbstractPlus | Full Text: PDF(232 KB) IEEE CNF 



2. Smiley - an interactive tool for monitoring inter-module function calls 

Goldman, N.M.; 

Program Comprehension, 2000. Proceedings. IWPC 2000. 8th International W 

10-11 June 2000 Page(s):109 - 118 